Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions
Translating verbose information needs into crisp search queries is a
phenomenon that is ubiquitous but hardly understood. Insights into this process
could be valuable in several applications, including synthesizing large
privacy-friendly query logs from public Web sources which are readily available
to the academic research community. In this work, we take a step towards
understanding query formulation by tapping into the rich potential of community
question answering (CQA) forums. Specifically, we sample natural language (NL)
questions spanning diverse themes from the Stack Exchange platform, and conduct
a large-scale conversion experiment where crowdworkers submit search queries
they would use when looking for equivalent information. We provide a careful
analysis of this data, accounting for possible sources of bias during
conversion, along with insights into user-specific linguistic patterns and
search behaviors. We release a dataset of 7,000 question-query pairs from this
study to facilitate further research on query understanding. Comment: ECIR 2020 Short Paper
ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters
To bridge the gap between the capabilities of the state-of-the-art in factoid
question answering (QA) and what users ask, we need large datasets of real user
questions that capture the various question phenomena users are interested in,
and the diverse ways in which these questions are formulated. We introduce
ComQA, a large dataset of real user questions that exhibit different
challenging aspects such as compositionality, temporal reasoning, and
comparisons. ComQA questions come from the WikiAnswers community QA platform,
which typically contains questions that are not satisfactorily answerable by
existing search engine technology. Through a large crowdsourcing effort, we
clean the question dataset, group questions into paraphrase clusters, and
annotate clusters with their answers. ComQA contains 11,214 questions grouped
into 4,834 paraphrase clusters. We detail the process of constructing ComQA,
including the measures taken to ensure its high quality while making effective
use of crowdsourcing. We also present an extensive analysis of the dataset and
the results achieved by state-of-the-art systems on ComQA, demonstrating that
our dataset can be a driver of future research on QA. Comment: 11 pages, NAACL 201
Conversational Question Answering over Passages by Leveraging Word Proximity Networks
Question answering (QA) over text passages is a problem of long-standing
interest in information retrieval. Recently, the conversational setting has
attracted attention, where a user asks a sequence of questions to satisfy her
information needs around a topic. While this setup is a natural one and similar
to humans conversing with each other, it introduces two key research
challenges: understanding the context left implicit by the user in follow-up
questions, and dealing with ad hoc question formulations. In this work, we
demonstrate CROWN (Conversational passage ranking by Reasoning Over Word
Networks): an unsupervised yet effective system for conversational QA with
passage responses, that supports several modes of context propagation over
multiple turns. To this end, CROWN first builds a word proximity network (WPN)
from large corpora to store statistically significant term co-occurrences. At
answering time, passages are ranked by a combination of their similarity to the
question, and coherence of query terms within: these factors are measured by
reading off node and edge weights from the WPN. CROWN provides an interface
that is both intuitive for end-users, and insightful for experts for
reconfiguration to individual setups. CROWN was evaluated on TREC CAsT data,
where it achieved above-median performance in a pool of neural methods. Comment: SIGIR 2020 Demonstration
CompMix: A Benchmark for Heterogeneous Question Answering
Fact-centric question answering (QA) often requires access to multiple,
heterogeneous, information sources. By jointly considering several sources like
a knowledge base (KB), a text collection, and tables from the web, QA systems
can enhance their answer coverage and confidence. However, existing QA
benchmarks are mostly constructed with a single source of knowledge in mind.
This limits capabilities of these benchmarks to fairly evaluate QA systems that
can tap into more than one information repository. To bridge this gap, we
release CompMix, a crowdsourced QA benchmark which naturally demands the
integration of a mixture of input sources. CompMix has a total of 9,410
questions, and features several complex intents like joins and temporal
conditions. Evaluation of a range of QA systems on CompMix highlights the need
for further research on leveraging information from heterogeneous sources.
CROWN: Conversational Passage Ranking by Reasoning over Word Networks
Information needs around a topic cannot be satisfied in a single turn; users
typically ask follow-up questions referring to the same theme and a system must
be capable of understanding the conversational context of a request to retrieve
correct answers. In this paper, we present our submission to the TREC
Conversational Assistance Track 2019, in which such a conversational setting is
explored. We propose a simple unsupervised method for conversational passage
ranking by formulating the passage score for a query as a combination of
similarity and coherence. To be specific, passages are preferred that contain
words semantically similar to the words used in the question, and where such
words appear close by. We built a word-proximity network (WPN) from a large
corpus, where words are nodes and there is an edge between two nodes if they
co-occur in the same passages in a statistically significant way, within a
context window. Our approach, named CROWN, improved nDCG scores over a provided
Indri baseline on the CAsT training data. On the evaluation data for CAsT, our
best run submission achieved above-average performance with respect to AP@5 and
nDCG@1000. Comment: TREC 2019, 14 pages
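The similarity-plus-coherence idea in the abstract above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the CROWN implementation: the edge weight is taken to be normalized pointwise mutual information (NPMI), exact term matching stands in for semantic similarity, and the names `build_wpn`, `score_passage`, `window`, `min_npmi`, and the mixing weight `h` are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def sliding_windows(tokens, window):
    """Yield the set of distinct tokens in each sliding context window."""
    if len(tokens) <= window:
        yield set(tokens)
        return
    for i in range(len(tokens) - window + 1):
        yield set(tokens[i:i + window])


def build_wpn(corpus, window=3, min_npmi=0.0):
    """Toy word proximity network: maps a word pair to its NPMI weight.

    Assumption: NPMI above a threshold approximates "statistically
    significant co-occurrence within a context window".
    """
    word_n, pair_n, total = Counter(), Counter(), 0
    for passage in corpus:
        for win in sliding_windows(passage.lower().split(), window):
            total += 1
            word_n.update(win)
            pair_n.update(frozenset(p) for p in combinations(sorted(win), 2))
    edges = {}
    for pair, n_ab in pair_n.items():
        a, b = tuple(pair)
        p_ab = n_ab / total
        if p_ab == 1.0:
            npmi = 1.0  # pair occurs in every window; avoid dividing by log(1) = 0
        else:
            pmi = math.log(p_ab / ((word_n[a] / total) * (word_n[b] / total)))
            npmi = pmi / -math.log(p_ab)
        if npmi > min_npmi:
            edges[pair] = npmi
    return edges


def score_passage(query, passage, wpn, window=3, h=0.5):
    """Combine similarity and coherence into a single passage score."""
    q_terms = set(query.lower().split())
    positions = {}
    for i, tok in enumerate(passage.lower().split()):
        if tok in q_terms:
            positions.setdefault(tok, []).append(i)
    # Similarity (assumption): fraction of query terms present in the passage.
    similarity = len(positions) / len(q_terms) if q_terms else 0.0
    # Coherence: WPN edge weights of query-term pairs that appear close together.
    coherence = 0.0
    for a, b in combinations(sorted(positions), 2):
        if any(abs(i - j) < window for i in positions[a] for j in positions[b]):
            coherence += wpn.get(frozenset((a, b)), 0.0)
    return h * similarity + (1 - h) * coherence


corpus = ["frog pond water", "frog pond lily", "car road traffic", "car road bridge"]
wpn = build_wpn(corpus)
near = score_passage("frog pond", "the frog pond is quiet", wpn)
far = score_passage("frog pond", "frog lives far away from any pond", wpn)
print(near > far)
```

Both passages contain both query terms, so they tie on similarity; the first ranks higher only because the terms sit within the context window and share a WPN edge.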
Explainable Conversational Question Answering over Heterogeneous Sources via Iterative Graph Neural Networks
In conversational question answering, users express their information needs
through a series of utterances with incomplete context. Typical ConvQA methods
rely on a single source (a knowledge base (KB), or a text corpus, or a set of
tables), thus being unable to benefit from increased answer coverage and
redundancy of multiple sources. Our method EXPLAIGNN overcomes these
limitations by integrating information from a mixture of sources with
user-comprehensible explanations for answers. It constructs a heterogeneous
graph from entities and evidence snippets retrieved from a KB, a text corpus,
web tables, and infoboxes. This large graph is then iteratively reduced via
graph neural networks that incorporate question-level attention, until the best
answers and their explanations are distilled. Experiments show that EXPLAIGNN
improves performance over state-of-the-art baselines. A user study demonstrates
that derived answers are understandable by end users. Comment: SIGIR 2023 Research Track Long Paper
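The iterative reduction described above can be illustrated with a much simpler stand-in. In the sketch below, a hand-written word-overlap score replaces the graph neural network and its question-level attention, so it shows only the shrink-then-rescore loop, not the authors' model. The function name `reduce_graph`, the `keep_per_round` schedule, and the example evidence are all hypothetical.

```python
def reduce_graph(question, evidences, keep_per_round=(3, 2), top_answers=1):
    """Iteratively shrink an evidence-entity graph, then read off answers.

    evidences: dict mapping an evidence snippet to the set of candidate
    entities it mentions (a bipartite graph in adjacency form).
    Assumption: word overlap with the question stands in for a learned score.
    """
    q_terms = set(question.lower().split())

    def overlap(evidence):
        return len(q_terms & set(evidence.lower().split()))

    current = dict(evidences)
    for keep in keep_per_round:
        # Keep only the best-scoring snippets; ties are broken alphabetically.
        kept = sorted(current, key=lambda ev: (-overlap(ev), ev))[:keep]
        current = {ev: current[ev] for ev in kept}

    # Score each entity by the summed score of its surviving evidence.
    entity_scores = {}
    for ev, entities in current.items():
        for entity in entities:
            entity_scores[entity] = entity_scores.get(entity, 0) + overlap(ev)

    ranked = sorted(entity_scores, key=lambda e: (-entity_scores[e], e))
    answers = ranked[:top_answers]
    # The surviving snippets that mention an answer serve as its explanation.
    explanations = {
        a: sorted(ev for ev, ents in current.items() if a in ents) for a in answers
    }
    return answers, explanations


evidences = {
    "inception was directed by christopher nolan": {"Christopher Nolan"},
    "christopher nolan directed inception in 2010": {"Christopher Nolan"},
    "leonardo dicaprio starred in inception": {"Leonardo DiCaprio"},
    "hans zimmer composed the score": {"Hans Zimmer"},
}
answers, explanations = reduce_graph("who directed inception", evidences)
print(answers, explanations)
```

Because the snippets that survive pruning are returned alongside the answer, the reduced graph doubles as the explanation, which is the design point the abstract emphasizes.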
FAIRY: A Framework for Understanding Relationships between Users' Actions and their Social Feeds
Users increasingly rely on social media feeds for consuming daily
information. The items in a feed, such as news, questions, songs, etc., usually
result from the complex interplay of a user's social contacts, her interests
and her actions on the platform. The relationship of the user's own behavior
and the received feed is often puzzling, and many users would like to have a
clear explanation on why certain items were shown to them. Transparency and
explainability are key concerns in the modern world of cognitive overload,
filter bubbles, user tracking, and privacy risks. This paper presents FAIRY, a
framework that systematically discovers, ranks, and explains relationships
between users' actions and items in their social media feeds. We model the
user's local neighborhood on the platform as an interaction graph, a form of
heterogeneous information network constructed solely from information that is
easily accessible to the concerned user. We posit that paths in this
interaction graph connecting the user and her feed items can act as pertinent
explanations for the user. These paths are scored with a learning-to-rank model
that captures relevance and surprisal. User studies on two social platforms
demonstrate the practical viability and user benefits of the FAIRY method. Comment: WSDM 201
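A minimal sketch of the path-as-explanation idea follows, assuming a tiny hand-built interaction graph. Sorting by path length is only a placeholder for the learning-to-rank model mentioned above, and the graph contents, relation labels, and the names `explanation_paths` and `render` are hypothetical.

```python
def explanation_paths(graph, user, item, max_edges=3):
    """Enumerate simple paths from a user to a feed item.

    graph: dict mapping a node to a list of (relation, neighbor) pairs,
    i.e. a small heterogeneous network with labeled edges.
    Each path alternates node, relation, node, and so on.
    """
    results = []

    def walk(node, path, visited):
        if node == item:
            results.append(path)
            return
        if len(visited) > max_edges:  # path already uses max_edges edges
            return
        for relation, nxt in graph.get(node, []):
            if nxt not in visited:
                walk(nxt, path + [relation, nxt], visited | {nxt})

    walk(user, [user], {user})
    return results


def render(path):
    """Turn a path into a readable explanation string."""
    return " -> ".join(path)


graph = {
    "alice": [("follows", "bob"), ("likes", "jazz")],
    "bob": [("upvoted", "post42")],
    "jazz": [("topic_of", "post42"), ("topic_of", "post7")],
    "post7": [("written_by", "bob")],
}
paths = explanation_paths(graph, "alice", "post42")
# Placeholder ranking (assumption): shorter paths first, not a learned model.
for path in sorted(paths, key=len):
    print(render(path))
```

The `max_edges` bound keeps the enumeration tractable and drops long, roundabout paths, which are unlikely to read as convincing explanations.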